43 research outputs found

    Adaptive detection and tracking using multimodal information

    This thesis describes work on fusing data from multiple sources of information, and focuses on two main areas: adaptive detection and adaptive object tracking in automated vision scenarios. The work on adaptive object detection explores a new paradigm in dynamic parameter selection, by selecting thresholds for object detection to maximise agreement between pairs of sources. Object tracking, a complementary technique to object detection, is also explored in a multi-source context and an efficient framework for robust tracking, termed the Spatiogram Bank tracker, is proposed as a means to overcome the difficulties of traditional histogram tracking. As well as performing theoretical analysis of the proposed methods, specific example applications are given for both the detection and the tracking aspects, using thermal infrared and visible spectrum video data, as well as other multi-modal information sources

    Human motion reconstruction using wearable accelerometers

    We address the problem of capturing human motion in scenarios where the use of a traditional optical motion capture system is impractical. Such scenarios are relatively commonplace, for example in large spaces, outdoors or at competitive sporting events, where the limitations of such systems are apparent: the small physical area where motion capture can be done and the lack of robustness to lighting changes and occlusions. In this paper, we advocate the use of body-worn wireless accelerometers for reconstructing human motion and to this end we outline a system that is more portable than traditional optical motion capture systems, whilst producing naturalistic motion. Additionally, if information on the person's root position is available, an extended version of our algorithm can use this information to correct positional drift

    Detector adaptation by maximising agreement between independent data sources

    Traditional methods for creating classifiers have two main disadvantages. Firstly, it is time consuming to acquire, or manually annotate, the training collection. Secondly, the data on which the classifier is trained may be over-generalised or too specific. This paper presents our investigations into overcoming both of these drawbacks simultaneously, by providing example applications where two data sources train each other. This both removes the need for supervised annotation or feedback and allows rapid adaptation of the classifier to different data. Two applications are presented: one using thermal infrared and visual imagery to robustly learn changing skin models, and another using changes in saturation and luminance to learn shadow appearance parameters

    An improved spatiogram similarity measure for robust object localisation

    Spatiograms were introduced as a generalisation of the commonly used histogram, providing the flexibility of adding spatial context information to the feature distribution information of a histogram. The originally proposed spatiogram comparison measure has significant disadvantages that we detail here. We propose an improved measure based on deriving the Bhattacharyya coefficient for an infinite number of spatial-feature bins. Its advantages over the previous measure and over histogram-based matching are demonstrated in object tracking scenarios
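
As a point of reference for the abstract above, the following is a minimal sketch of the Bhattacharyya coefficient for ordinary histograms, which is the quantity the abstract says the improved measure is derived from. It does not implement the spatiogram measure itself; the spatial means and covariances that a spatiogram stores per bin are omitted. The function name `bhattacharyya_coefficient` is a hypothetical choice for illustration.

```python
import numpy as np


def bhattacharyya_coefficient(h1, h2):
    """Bhattacharyya coefficient between two histograms of equal shape.

    Histogram-only sketch: each input is normalised to sum to 1 and the
    result lies in [0, 1], with 1 meaning identical distributions.
    """
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    if h1.shape != h2.shape:
        raise ValueError("histograms must have the same shape")
    s1, s2 = h1.sum(), h2.sum()
    if s1 <= 0 or s2 <= 0:
        raise ValueError("histograms must contain some mass")
    p = h1 / s1  # normalised bin probabilities of the first histogram
    q = h2 / s2  # normalised bin probabilities of the second histogram
    return float(np.sum(np.sqrt(p * q)))
```

Because this score depends only on bin counts, two image regions with the same colour distribution but different spatial layouts score identically, which is the limitation that adding spatial context to each bin is meant to address.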

    A hybrid method for indoor user localisation

    In this work we describe an approach to indoor user localisation by combining image-based and RF-based methods and compare this new approach to prior work. This paper details a new algorithm for indoor user localisation, demonstrating more effective user localisation than prior approaches and therefore presents the next step in combining two different technologies for localisation in indoor type environments

    TennisSense: a platform for extracting semantic information from multi-camera tennis data

    In this paper, we introduce TennisSense, a technology platform for the digital capture, analysis and retrieval of tennis training and matches. Our algorithms for extracting useful metadata from the overhead court camera are described and evaluated. We track the tennis ball using motion images for ball candidate detection and then link ball candidates into locally linear tracks. From these tracks we can infer when serves and rallies take place. Using background subtraction and hysteresis-type blob tracking, we track the tennis players' positions. The performance of both modules is evaluated using ground-truthed data. The extracted metadata provides valuable information for indexing and efficient browsing of hours of multi-camera tennis footage and we briefly illustrate how this data is used by our tennis-coach playback interface

    Multispectral object segmentation and retrieval in surveillance video

    This paper describes a system for object segmentation and feature extraction for surveillance video. Segmentation is performed by a dynamic vision system that fuses information from thermal infrared video with standard CCTV video in order to detect and track objects. Separate background modelling in each modality and dynamic mutual information based thresholding are used to provide initial foreground candidates for tracking. The belief in the validity of these candidates is ascertained using knowledge of foreground pixels and temporal linking of candidates. The transferable belief model is used to combine these sources of information and segment objects. Extracted objects are subsequently tracked using adaptive thermo-visual appearance models. In order to facilitate search and classification of objects in large archives, retrieval features from both modalities are extracted for tracked objects. Overall system performance is demonstrated in a simple retrieval scenario

    Detection thresholding using mutual information

    In this paper, we introduce a novel non-parametric thresholding method that we term Mutual-Information Thresholding. In our approach, we choose the two detection thresholds for two input signals such that the mutual information between the thresholded signals is maximised. Two efficient algorithms implementing our idea are presented: one using dynamic programming to fully explore the quantised search space and the other method using the Simplex algorithm to perform gradient ascent to significantly speed up the search, under the assumption of surface convexity. We demonstrate the effectiveness of our approach in foreground detection (using multi-modal data) and as a component in a person detection system
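
The idea in the abstract above can be sketched as a brute-force search: for every candidate pair of thresholds, binarise both signals and keep the pair whose binary outputs share the most mutual information. This is only an illustrative exhaustive search, not the dynamic-programming or Simplex-based algorithms the abstract mentions. The function names and the candidate-threshold lists are assumptions made for this example.

```python
import numpy as np


def binary_mutual_information(a, b):
    """Mutual information, in bits, between two boolean arrays of equal size."""
    a = np.asarray(a, dtype=bool).ravel()
    b = np.asarray(b, dtype=bool).ravel()
    mi = 0.0
    for va in (False, True):
        for vb in (False, True):
            p_ab = np.mean((a == va) & (b == vb))  # joint probability
            p_a = np.mean(a == va)                 # marginal of a
            p_b = np.mean(b == vb)                 # marginal of b
            if p_ab > 0:  # zero-probability cells contribute nothing
                mi += p_ab * np.log2(p_ab / (p_a * p_b))
    return mi


def mi_threshold_pair(x, y, candidates_x, candidates_y):
    """Exhaustively pick (tx, ty) maximising MI between (x > tx) and (y > ty).

    Returns a tuple (tx, ty, mutual_information_in_bits).
    """
    best = (None, None, -1.0)
    for tx in candidates_x:
        bx = np.asarray(x) > tx
        for ty in candidates_y:
            score = binary_mutual_information(bx, np.asarray(y) > ty)
            if score > best[2]:
                best = (tx, ty, score)
    return best
```

In a two-sensor setting, `x` and `y` might be per-pixel difference images from two registered modalities; a threshold that marks everything or nothing as foreground yields zero mutual information, so the search naturally avoids those degenerate choices.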

    Automatic camera selection for activity monitoring in a multi-camera system for tennis

    In professional tennis training matches, the coach needs to be able to view play from the most appropriate angle in order to monitor players' activities. In this paper, we describe and evaluate a system for automatic camera selection from a network of synchronised cameras within a tennis sporting arena. This work combines synchronised video streams from multiple cameras into a single summary video suitable for critical review by both tennis players and coaches. Using an overhead camera view, our system automatically determines the 2D tennis-court calibration resulting in a mapping that relates a player's position in the overhead camera to their position and size in another camera view in the network. This allows the system to determine the appearance of a player in each of the other cameras and thereby choose the best view for each player via a novel technique. The video summaries are evaluated in end-user studies and shown to provide an efficient means of multi-stream visualisation for tennis player activity monitoring

    Combining inertial and visual sensing for human action recognition in tennis

    In this paper, we present a framework for both the automatic extraction of the temporal location of tennis strokes within a match and the subsequent classification of these as a serve, forehand or backhand. We use low-cost visual sensing and low-cost inertial sensing to achieve these aims, whereby a single modality can be used or a fusion of both classification strategies can be adopted if both modalities are available within a given capture scenario. This flexibility allows the framework to be applicable to a variety of user scenarios and hardware infrastructures. Our proposed approach is quantitatively evaluated using data captured from elite tennis players. Results point to the extremely accurate performance of the proposed approach irrespective of input modality configuration